PyTorch Implementations of Self-Attention Building Blocks for Computer Vision


Overview

A very handy Git repository that packages a comprehensive set of self-attention building blocks for computer vision. You can call them directly, with no need to reinvent the wheel.

Repository: https://github.com/The-AI-Summer/self-attention-cv

The library implements self-attention mechanisms for computer vision in PyTorch using einsum and einops, with a focus on self-attention modules for vision tasks.
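To give a taste of the einsum style the library is built on, here is a minimal sketch of scaled dot-product self-attention written with torch.einsum. This is an illustration, not the library's actual code; the function name and shapes are my own.

```python
import torch

def scaled_dot_product(q, k, v):
    # q, k, v: [batch, tokens, dim]  (hypothetical sketch, not the library's code)
    scale = q.shape[-1] ** -0.5
    # Pairwise query-key dot products -> attention scores [batch, tokens, tokens].
    scores = torch.einsum('bid,bjd->bij', q, k) * scale
    attn = torch.softmax(scores, dim=-1)
    # Weighted sum of the values -> [batch, tokens, dim].
    return torch.einsum('bij,bjd->bid', attn, v)

x = torch.rand(16, 10, 64)  # same shape as the library examples below
out = scaled_dot_product(x, x, x)  # [16, 10, 64]
```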

Install with pip:

```
$ pip install self-attention-cv
```
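A quick way to sanity-check the install, assuming (as the examples below suggest) that the mask argument is optional:

```python
import torch
from self_attention_cv import MultiHeadSelfAttention

model = MultiHeadSelfAttention(dim=64)
x = torch.rand(1, 10, 64)  # [batch, tokens, dim]
print(model(x).shape)  # expected: torch.Size([1, 10, 64])
```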

If you don't have a GPU, it is best to pre-install PyTorch in your environment first.
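For example, a CPU-only build can typically be installed from PyTorch's wheel index (the exact command varies by release; check pytorch.org for the one matching your platform):

```
$ pip install torch --index-url https://download.pytorch.org/whl/cpu
```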

Related articles

- How Attention works in Deep Learning
- How Transformers work in deep learning and NLP
- How the Vision Transformer (ViT) works in 10 minutes: an image is worth 16x16 words
- Understanding einsum for Deep learning: implement a transformer with multi-head self-attention from scratch
- How Positional Embeddings work in Self-Attention

Example code

Multi-head attention

```python
import torch
from self_attention_cv import MultiHeadSelfAttention

model = MultiHeadSelfAttention(dim=64)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10)  # tokens x tokens
mask[5:8, 5:8] = 1
y = model(x, mask)
```

Axial attention

```python
import torch
from self_attention_cv import AxialAttentionBlock

model = AxialAttentionBlock(in_channels=256, dim=64, heads=8)
x = torch.rand(1, 256, 64, 64)  # [batch, channels, height, width]
y = model(x)
```

Vanilla Transformer Encoder

```python
import torch
from self_attention_cv import TransformerEncoder

model = TransformerEncoder(dim=64, blocks=6, heads=8)
x = torch.rand(16, 10, 64)  # [batch, tokens, dim]
mask = torch.zeros(10, 10)  # tokens x tokens
mask[5:8, 5:8] = 1
y = model(x, mask)
```

Vision Transformer with a ResNet50 backbone for image classification (a sketch of the patch-embedding idea behind ViT appears after the references below)

```python
import torch
from self_attention_cv import ViT, ResNet50ViT

model1 = ResNet50ViT(img_dim=128, pretrained_resnet=False, blocks=6,
                     num_classes=10, dim_linear_block=256, dim=256)
# or
model2 = ViT(img_dim=256, in_channels=3, patch_dim=16, num_classes=10, dim=512)
x = torch.rand(2, 3, 256, 256)
y = model2(x)  # [2, 10]
```

A reimplementation of Unet with a Vision Transformer encoder (TransUnet)

```python
import torch
from self_attention_cv.transunet import TransUnet

a = torch.rand(2, 3, 128, 128)
model = TransUnet(in_channels=3, img_dim=128, vit_blocks=8,
                  vit_dim_linear_mhsa_block=512, classes=5)
y = model(a)  # [2, 5, 128, 128]
```

Bottleneck Attention block

```python
import torch
from self_attention_cv.bottleneck_transformer import BottleneckBlock

inp = torch.rand(1, 512, 32, 32)
bottleneck_block = BottleneckBlock(in_channels=512, fmap_size=(32, 32),
                                   heads=4, out_channels=1024, pooling=True)
y = bottleneck_block(inp)
```

Available position embeddings

1D Positional Embeddings

```python
import torch
from self_attention_cv.pos_embeddings import AbsPosEmb1D, RelPosEmb1D

model = AbsPosEmb1D(tokens=20, dim_head=64)
q = torch.rand(2, 3, 20, 64)  # [batch, heads, tokens, dim_head]
y1 = model(q)

model = RelPosEmb1D(tokens=20, dim_head=64, heads=3)
q = torch.rand(2, 3, 20, 64)
y2 = model(q)
```

2D Positional Embeddings

```python
import torch
from self_attention_cv.pos_embeddings import RelPosEmb2D

dim = 32  # spatial dim of the feature map
model = RelPosEmb2D(feat_map_size=(dim, dim), dim_head=128)
q = torch.rand(2, 4, dim * dim, 128)
y = model(q)
```

References

1. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I. (2017). Attention is all you need. arXiv preprint arXiv:1706.03762.
2. Wang, H., Zhu, Y., Green, B., Adam, H., Yuille, A., & Chen, L. C. (2020). Axial-DeepLab: Stand-alone axial-attention for panoptic segmentation. In European Conference on Computer Vision (pp. 108-126). Springer, Cham.
3. Srinivas, A., Lin, T. Y., Parmar, N., Shlens, J., Abbeel, P., & Vaswani, A. (2021). Bottleneck transformers for visual recognition. arXiv preprint arXiv:2101.11605.
4. Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., ... & Houlsby, N. (2020). An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
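As promised above, here is a minimal sketch of the patch-embedding step behind the "image is worth 16x16 words" idea, written with einops. It illustrates the technique, not the library's implementation; the `to_token` projection and all shapes are my own, mirroring the `patch_dim=16` ViT example.

```python
import torch
import torch.nn as nn
from einops import rearrange

patch_dim = 16                    # as in the ViT example above
img = torch.rand(2, 3, 256, 256)  # [batch, channels, height, width]

# Cut the image into non-overlapping 16x16 patches and flatten each one:
# (256 / 16)^2 = 256 tokens, each of size 16 * 16 * 3 = 768.
patches = rearrange(img, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)',
                    p1=patch_dim, p2=patch_dim)
print(patches.shape)  # torch.Size([2, 256, 768])

# Linearly project each flattened patch to the model dimension,
# producing the token sequence a transformer encoder consumes.
to_token = nn.Linear(patch_dim * patch_dim * 3, 512)
tokens = to_token(patches)  # [2, 256, 512]
```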

—END—

English original: https://github.com/The-AI-Summer/self-attention-cv
